ABSTRACT


This report describes the implementation, on a PDP-8 
computer, of a scheme to compress speech by removing its temp- 
oral redundancy. Three features of the speech waveform are 
extracted for every 12.8 msec segment while speech is being 
digitized and stored. A decision making program determines the 
contiguous segments which can be grouped together on the basis 
of these features. Measurements on the waveform are also used 
to identify the type of speech sound represented by each group 
of segments. Based on this information redundant segments are 
found and the output program is instructed to skip these during 
digital to analog conversion. The thresholds for decision making 
are continually adjusted according to the intensity level of the 
utterances. Care is taken to minimize the transients in the 
vicinity of the junctions of the retained segments. With 4 K words
of core capacity and a disk memory of 32 K words, short sentences
have been compressed. The method of compression developed is
selective and minimizes processing time on the computer. The
resulting compressed speech is fairly intelligible and of good quality.


ACKNOWLEDGEMENTS 


The author is indebted to Professor Y. J. Kingma, the 
supervisor of this project, for his advice and guidance throughout
the course of this work.

The author also wishes to acknowledge the helpful suggestions
given by Dr. Murray S. Miron, Dr. Robert J. Scott and
Dr. Emerson Foulke at the Second Louisville Conference on Rate
and/or Frequency Controlled Speech held at the University of
Louisville, Kentucky, U.S.A. in October 1969, where a paper based
on this work was presented (27).

Financial assistance received under the Canadian 
Commonwealth Scholarship and Fellowship Plan is gratefully
acknowledged.



TABLE OF CONTENTS


CHAPTER I      INTRODUCTION

CHAPTER II     INPUT AND FEATURE EXTRACTION

CHAPTER III    DECISION MAKING

CHAPTER IV     TRANSIENT REMOVAL AND OUTPUT

CHAPTER V      CONCLUSIONS

REFERENCES

APPENDIX A     English Phonemes

APPENDIX B     Source Program Listing


LIST OF TABLES 


TABLE 1    THRESHOLDS FOR DECISION MAKING


LIST OF ILLUSTRATIONS 


1.  System block diagram
2.  Sound intensity variations in speech waveform
3.  Waveform asymmetry in voiced and unvoiced sounds
4.  Flow-chart for input and feature extraction program
5.  Core usage diagram for input program
6.  Flow-chart for decision making program
7.  Core usage diagram for decision making program
8.  Original word, extracted features, and compressed word
9.  Flow-chart of transient removal and output program
10. Core usage diagram for output program






CHAPTER I 


INTRODUCTION 


The importance of spoken language as a means of communication
in daily life can hardly be overemphasized. As in visual reading,
a significant variable in aural communication is the rate at
which it occurs. This is of special interest to those who, for one 
reason or another, must depend upon aural communication. 

Time compressed speech is speech which has been reproduced 
in less than the original production time, that is, speech at an in- 
creased rate. Apart from being useful in various educational settings, 
compressed speech may be employed in studying the temporal requirements 
of the listener as he processes spoken language (1).

The most obvious method of increasing word rate is speaking 
rapidly. This method has serious drawbacks in that the speaker must 
be well trained and even then the rate and the clarity are limited by 
the physical processes involved in speech production. 

Other methods of speech compression take advantage of the 
fact, indicated by early speech research, that much of the natural 
speech signal is redundant (2). It was found that the speech wave 
pattern could be variously distorted without serious losses in the 
intelligibility. Fletcher (3) studied the intelligibility
of speech compressed by reproducing a tape or record at a speed faster
than the one used during recording. Losses in intelligibility were
found to be small until the playback speed reached 1.4 times the
original speed. This method is limited, however, by the accompanying
distortion due to the frequency shift.

In 1950, Miller and Licklider (4) showed that speech re- 
mained intelligible if interrupted more than ten times a second until 
about half of the original speech signal had been removed. Based on 
this work Garvey (5) obtained compressed speech by removing the silent 
spaces in the interrupted speech and splicing together the remaining 
segments of the tape record. This "chop-splice" technique resulted in
compressed speech with no frequency distortion and with reasonable
intelligibility. One year later Fairbanks et al. (6) described an
electro-mechanical apparatus for time compression or expansion of 
speech which used the general principle demonstrated by Garvey. Sim- 
ilar approaches had previously been indicated by Gabor (7,8) and others. 
This method of speech compression has come to be known as the sampling 
method since recorded speech is sampled by retaining and discarding 
portions of the speech periodically. It is obvious that the sampling 
method is unselective with respect to the portions of a recorded signal 
that are discarded. There is, therefore, some probability that the 
discarded segments may contain auditory cues essential to perception. 
The probability that an auditory cue may lie entirely within a discarded 
segment decreases as the discard interval is made smaller. However, 
when the discard interval is small the recorded speech has to be 
sampled more often to obtain the same amount of compression. If the
sampling frequency becomes high enough to be audible it interferes badly
with the speech signal.
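
The periodic retain-and-discard rule of the sampling method can be illustrated with a minimal Python sketch. It is not taken from any of the cited implementations; the function name and the parameters keep_len and discard_len are assumptions introduced here purely for illustration, and no attempt is made to smooth the junctions between retained portions.

```python
def sampling_compress(samples, keep_len, discard_len):
    """Periodically retain keep_len samples and drop the next discard_len.

    The rule is unselective: it never looks at the content of what it drops.
    """
    if keep_len <= 0 or discard_len < 0:
        raise ValueError("keep_len must be positive and discard_len non-negative")
    period = keep_len + discard_len
    out = []
    for start in range(0, len(samples), period):
        # Keep the first keep_len samples of each period, skip the rest.
        out.extend(samples[start:start + keep_len])
    return out


# Retaining 3 of every 5 samples leaves 60 percent of the original duration.
signal = list(range(20))
compressed = sampling_compress(signal, keep_len=3, discard_len=2)
```

A shorter discard interval lowers the chance of losing a whole auditory cue, but for the same compression it raises the number of junctions per second, which is the trade-off noted above.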

Another device for speech compression, the "Harmonic 
Compressor", based on research carried out at the Bell Telephone 
Laboratories, Inc. (9), has been developed at the American Founda- 
tion for the Blind. In this device the speech signal is separated 
into individual harmonic frequency components by an elaborate bank 
of bandpass filters and the frequencies are halved. This half- 
spectrum speech is then resynthesized and recorded. Playing the 
record at twice the speed used during recording restores the frequency 
spectrum, resulting in speech compressed to 50% in time without any 
shift in pitch. A serious limitation of the harmonic compressor is 
that it cannot be adjusted for any desired amount of compression. 
Moreover, unvoiced sounds and noise, which are devoid of the quasi- 
periodicity on which the harmonic compressor is based, are distorted 
to some extent. 

Compressed speech can also be produced by actually synthesiz- 
ing speech at a rate faster than normal (10). This is accomplished 
by recording the control parameters of speech obtained from a vocoder 
analyzer and supplying them at a faster rate to a vocoder synthesizer 
to remake compressed speech. It would also be possible to synthesize 
faster speech by rule on a digital computer. This method has, as yet, 
received little development mainly because it is the most expensive to 
implement. 

Digital computers have been used to compress speech in a 
number of ways. Fairbanks' sampling method can be easily simulated
on the computer (11), and the durations of the retained and discarded
segments can be varied over a wide range. In 1966 Scott (12,11)
proposed the dichotic method of speech compression on a computer. In
this method, speech compressed by the sampling method is presented to 
one ear and the discard intervals are joined sequentially and present- 
ed to the other ear. Dichotic speech appears to have some advantage 
over speech compressed by the sampling method when reproduced in more 
than [...] percent of the original production time (13). The superiority,
however, is too small to be of practical significance (14).

Another method of compression attempted on the digital 
computer is the pitch period compression of speech (11,12,15). The 
locations of the pitch periods are calculated and a number of pitch 
periods are discarded depending upon the desired compression. Unvoiced 
sounds can be left alone or a discard interval can be arbitrarily 
established such as the average of the detectable periods in the 
immediate vicinity. The quality of speech compressed by this method 
depends on the level of sophistication employed in the pitch period 
detection process (which is known to be a difficult task (2)). The 
cost is enormous, perhaps 300 dollars per minute of original speech (15). 

From this brief review of the methods of speech compression 
it is obvious that the duration of certain phonemes in normal speech 
is longer than required for reliable recognition. In other words parts 
of some of the speech sounds are redundant. The object of speech com- 
pression is to remove this temporal redundancy and thus convey more 
information in less time. Although unselective removal of portions of 
recorded speech results in compressed speech, a satisfactory method must
be selective. Such a method would remove only those parts of the speech
signal which are redundant from the point of view of perception of the
speech sounds. Although the high cost of computer time on large 
computers is a prohibitive factor in the use of computers for speech 
compression, the demands of a selective method are best met by a 
digital computer. As described earlier the digital computer has been 
used in a number of ways for compressing speech but a selective or 
differential method has not yet been reported in the literature (16).

The object of this project has been to find an economical way 
of compressing speech on a computer, and to develop a selective method 
of compression utilizing the great flexibility offered by a programmed 
data processor. The possibility of compressing speech on a small 
computer, the PDP-8, has been investigated with a view to developing a 
satisfactory method with a minimum of processing time on the computer. 

The first step in selective speech compression is to disting- 
uish between various speech sounds. A number of methods of speech 
segmentation have been developed for the automatic recognition of speech 
(17,18,19). The purpose of the segmentation process in this case, how- 
ever, is to locate redundant parts rather than find sharp boundaries 
between phonemes. A time-domain method similar to (19) is used in 
preference to others (involving comparison of spectral properties) to 
avoid costly hardware or excessive computer time in finding the spectrum. 

The segments of the speech signal are first classified into 
transitional and sustained segments. Sustained segments are those which 
possess certain features of the speech waveform which do not change 
appreciably over the duration of the segment. These segments are also
tagged as vowel-like, fricative, plosive or silence according to the
properties of the speech waveform. Final decisions are then made as
to what parts of the speech signal can be removed without adversely 
affecting the perceptual cues contained in the signal. The decisions 
also take into consideration the desired amount of compression. The 
remaining speech segments are then abutted in time while carefully 
minimizing the transients at the junctions of the segments. 

The segmentation and classification processes used are an 
attempt at achieving a compromise between sophistication and large 
processing time on the computer. The level of sophistication achieved 
appears to be sufficient for the purpose of the problem at hand. 

On the present set-up with 4 K words of core capacity and a 
disk memory of 32 K words, speech has been processed only a sentence 
at a time. The limitation is due to the small storage capacity of the 
disk. 

All the programming of the PDP-8 computer was done in machine 
language using the mnemonic operation codes. PAL-D Assembler (Program 
Assembly Language for the Disk system) was used to assemble and translate 
the source program statements into the binary codes needed in machine 
instructions. Machine language programming, though cumbersome, was 
chosen to make an efficient use of processing time and the storage space 
available in the computer. 

The whole process of speech compression can be divided into 
three steps: 1) Input and Feature Extraction, 2) Decision Making and 
3) Transient Removal and Output. The following three chapters describe
each of these steps in some detail.


CHAPTER II


INPUT AND FEATURE EXTRACTION 


Time compression cannot be done in real time, because this 
would amount to predicting what the speaker was about to say. There- 
fore the speech signal to be compressed must be available to the device 
in recorded form regardless of what device is used for compression. In 
the case of a digital computer as a compressor, speech must be fed in 
and stored in a form suitable for use by the computer. 

Fig. 1 is a block diagram representation of the system used 
for this project. The digital computer used is a standard PDP-8 with 
a core capacity of 4 K words of 12 bits each. The teletype and DEC tape 
units connected to the computer were used for software development only. 
The DF 32 disk memory of 32 K words provided the storage for the 
digitized speech and the programs. The interface consisted of REDCOR 
12-bit analog to digital and digital to analog converters and logic 
circuits for timing the operations from an external clock (in this case 
a GR pulse generator). 

Input 

Recorded speech, reproduced at one-half the speed used during
recording, is band-limited to 2.5 kHz and digitized at 5 kHz to obtain
an effective sampling rate of 10 kHz. The sampling and digitization are
carried out by the 12-bit A/D converter. The ordinate of the slowed-down
speech waveform is read every 200 μsecs and quantized to the
nearest one of the 4096 (2048 positive and 2048 negative) levels. The
samples are temporarily stored in core and then swapped on to the disk
in large blocks of data.
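
The timing figures quoted above can be checked with a few lines of arithmetic. The Python sketch below restates them and adds a possible 12-bit quantizer; the quantize function, its full_scale parameter, and its rounding and clamping convention are assumptions of this sketch and do not describe the REDCOR converter.

```python
SLOWDOWN = 2          # tape replayed at one-half the recording speed
ADC_RATE_HZ = 5_000   # rate at which the slowed-down signal is digitized
ADC_BITS = 12

effective_rate_hz = ADC_RATE_HZ * SLOWDOWN               # referred to normal speed
adc_interval_us = 1_000_000 // ADC_RATE_HZ               # spacing of A/D readings
effective_interval_us = 1_000_000 // effective_rate_hz   # spacing at normal speed
levels = 2 ** ADC_BITS                                   # number of quantization levels


def quantize(x, full_scale=1.0):
    """Map x in [-full_scale, full_scale] to a signed 12-bit integer code.

    Assumed convention: round to the nearest code, then clamp to the
    two's-complement range -2048..2047.
    """
    half = levels // 2
    code = round(x / full_scale * half)
    return max(-half, min(half - 1, code))
```

With these values a 128-sample segment spans 128 x 100 μsecs = 12.8 msecs of speech at normal speed.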

The speech input takes place under program control. Three 
features of the speech waveform are extracted while speech is being 
digitized and stored. The start of the program is triggered by the 
speech signal from the tape recorder. This synchronization of the 
start of the speech signal and the program allows full use of the stor- 
age capacity of the disk to be made. Any silence interval before the 
speech signal is not digitized or stored. An electronic comparator and 
a skip logic circuit in the interface were used to achieve this control. 

The speech input continues until the disk memory is full 
(approximately three seconds of speech). Control is then transferred 
to a subroutine which reads the decision-making and output programs 
from the disk. 

Feature Extraction 

As a first step in feature extraction the speech wave is divided
into segments the duration of which is equivalent to 12.8 msecs of
speech played back at normal speed. The duration of the segment was
chosen to be large enough to include at least one complete pitch period 
of the heavy male voice and small enough so that significant changes do 
not occur during the segment. The particular value of 12.8 msecs was 
selected for ease of handling segments of 128 samples on the PDP-8 
computer, the core memory of which is organized into pages of 128 words
each. The following three features are extracted for every segment and
stored in the computer core:

1) The sound intensity 'I', defined as the absolute maximum of 128 samples,
the samples being the ordinates of the speech waveform at constant
intervals of 100 μsecs. If the 128 samples are represented by a vector Y
then the sound intensity

$I = \max_{1 \le i \le 128} |y_i|$,

$y_i$ being the elements of Y.

2) The waveform asymmetry (20) 'A', defined as the difference between the
positive maximum and the negative maximum of 128 samples:

$A = \max Y_p - \max |Y_n|$,

where $Y_p$ includes all the positive elements of Y and
$Y_n$ includes all the negative elements of Y.

3) The number of zero crossings 'Z' in 128 samples. A zero crossing is
said to occur whenever the sign of the ith element of Y is different
from the sign of the (i + 1)th element, i.e., if

$\operatorname{sgn} y_i \neq \operatorname{sgn} y_{i+1}$.
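
The three definitions above translate directly into code. The following Python sketch is only an illustration of the definitions, not the PDP-8 routine of Appendix B; the function name is hypothetical, the negative maximum is taken in magnitude, and a zero-valued sample is treated as non-negative, which is an assumption.

```python
SEGMENT_LEN = 128  # one PDP-8 core page; 12.8 msecs at 100 usecs per sample


def extract_features(segment):
    """Return (I, A, Z) for one segment of signed integer samples."""
    if len(segment) != SEGMENT_LEN:
        raise ValueError("expected a segment of 128 samples")
    # Largest positive ordinate and largest negative ordinate (as a magnitude).
    pos_max = max((y for y in segment if y > 0), default=0)
    neg_max = max((-y for y in segment if y < 0), default=0)
    intensity = max(pos_max, neg_max)   # I: absolute maximum of the segment
    asymmetry = pos_max - neg_max       # A: positive peak minus negative peak
    # Z: number of sign changes between adjacent samples
    # (assumption: a sample equal to zero counts as non-negative).
    zero_crossings = sum(
        1 for a, b in zip(segment, segment[1:]) if (a < 0) != (b < 0)
    )
    return intensity, asymmetry, zero_crossings


# A wave with unequal positive and negative peaks, alternating every sample.
alternating = extract_features([100, -60] * 64)
# A symmetric square wave with 16 samples per half cycle.
square = extract_features(([50] * 16 + [-50] * 16) * 4)
```

All three quantities need only comparisons and a counter per sample, which is why they can be gathered inside the A/D loop itself.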

The above characteristics of the speech waveform were chosen 
for the simplicity with which they can be extracted from raw speech. 
They are also quite effective in segmenting and identifying different 
types of sounds. An attempt was first made to compute the power spectra 
of speech at 10 msec intervals using the fast Fourier transform algorithm. 
It was planned to use spectral properties, such as ratio of power in 
different bands, for segmentation. This approach was soon dropped in 
favour of the one described here. The reason was the excessive computer 
time (of the order of 1 sec for 50 msecs of speech) taken in computing
the spectrum on a small machine like the PDP-8.





















The time-domain approach of segmentation has been used by 
others. Sakai and Doshita (21) used zero-crossing wave analysis and 
information about voicing to separate vowel-like phonemes from others. 
Different vowel-like phonemes were further distinguished by
formant-stability criteria obtained from zero-crossing analysis. Hughes and
Hemdal (22) used information about voicing, silence and turbulence
together with the property that semivowels and nasals are less intense
sounds compared with the neighbouring vowels. In the scheme used by
Reddy (19) the segmentation is mainly based on the variation or
stability of sound intensity levels. He uses zero-crossing counts for
error correction.

A glance at the speech waveform, Fig. 2, will show that the 
sound intensity, or the amplitude of the envelope, varies considerably 
during vowel-consonant and consonant-vowel transitions. The intensity 
level does not however vary much during the quasi-stable vowel-like 
sounds. This feature of the speech waveform is thus a good measure of 
the stability of vowel-like sounds. The sound intensity can also be 
used for separating semivowels and nasals from vowels because of the 
difference in the intensities of these sounds. Fricatives and plosive 
bursts are the sounds with the lowest amplitude and can thus be dis- 
tinguished from vowel-like sounds. The sound intensity is also an 
indicator of the presence or absence of voicing and is effective in
separating pause intervals from phonation.

It is known, and it was verified experimentally, that infinitely
clipped speech, a rectangular zero-crossing wave, is fairly intelligible.
This leads us to believe that much of the information of the
original speech signal is preserved even after amplitude simplification.

FIG. 2 SOUND INTENSITY VARIATIONS IN SPEECH WAVEFORM

FIG. 3 WAVEFORM ASYMMETRY IN VOICED AND UNVOICED SOUNDS

This information is contained in the width of each rectangular wave
between zero-crossings. It was used effectively by Sakai and Doshita 
(21) and others. A simplified characteristic, the zero-crossing count, 
is used here for ease of extraction. Though not as effective as the 
zero-crossing width analysis, when used alone, it does give an indica- 
tion of the frequency content of the speech sound. When used with the 
other two features it helps in distinguishing fricatives and plosive 
bursts from other types of sounds and silence intervals, apart from 
being an additional measure of the stability of any type of sound. 

Waveform asymmetry was first used for identifying and class- 
ifying voiced sounds in a 15-word vocabulary, voice-controlled adding 
machine developed by IBM Corporation, called Shoebox (23). The results 
of more recent work on the effectiveness of asymmetry measurement in 
identifying voiced sounds have been reported by Comer (20). All 
voiced sounds exhibit asymmetry, that is, there is a difference between 
the positive and negative peaks of the speech waveform. Unvoiced sounds, 
however, are composed of nonharmonically related components and are 
symmetrical about the base line. This is illustrated in Fig. 3. The 
value and polarity of the waveform asymmetry also vary for different
voiced sounds. This characteristic of the speech waveform can thus be 
employed to identify voiced sounds as opposed to unvoiced sounds and to 
segment different voiced sounds. 

In addition to these three features the location of the positive 
peak in each 12.8 msec segment is found and stored in the computer core
during input. The locations of the positive peaks are later used to
help remove transients that occur due to the deletion of redundant segments.

The programs for each step in the speech compression process
are designed to work independently. They are initially stored on the 
disk and are called sequentially into the main core memory of the 
computer and executed. The input and feature extraction program consists 
of instructions for A/D conversion, for writing the speech samples on 
the disk memory and for extracting the three features and the
locations of the positive maxima. The flow-chart and core usage diagram
for this program are shown in Fig. 4 and Fig. 5 respectively. The 
program starts reading from the A/D channel as soon as it senses a 
pulse indicating the start of the speech signal. The program loop is 
timed by pulses from the pulse generator so that A/D conversion takes 
place at constant intervals. The digitized speech samples are stored 
in a large area in core and later swapped onto the disk periodically 
without interfering with the A/D conversion. This is possible due to 
the data-break facility, for data transfer to the disk, available in 
the system. The instructions for feature extraction are also present 
in the same loop and the three features plus the location of the positive 
maximum are stored after each segment of 128 words has been read in. 

After modification in only one location the program is capable
of sampling at 10 kHz so that slowing the speech signal would not
be necessary. This will, however, introduce slight errors in the
sampling interval after each segment and whenever the subroutine for writing
on the disk is executed. The length of the program loop at these points
causes the sampling interval to be larger than 100 μsecs for one
sampling period. The effect of the error, however, is negligible.

FIG. 4 FLOW-CHART FOR INPUT AND FEATURE EXTRACTION PROGRAM



FIG. 5 CORE USAGE DIAGRAM FOR INPUT PROGRAM

At the end of the input the next program is automatically read
into a vacant area in core. Control is then passed on to this program
for the most crucial step in the compression process: decision making.


CHAPTER III


DECISION MAKING 


Once speech has been digitized and stored and useful informa- 
tion about the speech waveform has been extracted, decision making is 
the next logical step. In this step decisions, based on the avail- 
able information, are made as to which portions of the speech signal 
can be discarded without adversely affecting its perception. The 
decision making program accepts as input the three features of the 
speech waveform discussed in Chapter II and produces an output indicat- 
ing which particular 12.8 msec segments are to be discarded. 

In the first part of this program each of the three features 
for one segment is compared with those of the contiguous segments. An 
attempt is made to find similar segments which can be grouped together 
to represent a sustained speech sound. This is based on the reason 
that in the time domain the speech waveform can be thought to be com- 
posed of two types of intervals: a) quasi-stable intervals in which 
the parameters remain in an almost constant state; b) transitional 
intervals in which the parameters change gradually except for some time 
points at which parameters change abruptly. Thus the initial segmenta- 
tion procedure involves the location of time intervals during which no 
significant change takes place. 

The values of two of the three features, namely, the "sound
intensity" and the "waveform asymmetry", depend on the amplitude of the
utterance. Fixed thresholds for the comparison of these features would,
therefore, result in amplitude-dependent decisions. The three more
obvious methods of overcoming this problem are: a) to pass the speech
signal through an appropriate automatic gain control device before it
is A/D converted; b) to normalize the amplitude of the digitized samples
in the digital computer; and c) to continually adjust the thresholds
according to the amplitude of the speech signal. The first of these
needs extra hardware and the second takes excessive computer time.
The third, though not as sophisticated as the first two, was chosen
because it is the most economical to implement. The thresholds for
decision making (Table 1) are set every 128 msecs, their values
depending upon the intensity level of the utterance during the next
410 msecs. The decisions are thus made amplitude insensitive.
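
The look-ahead adjustment of thresholds can be sketched as follows. This is an illustration under stated assumptions, not the routine of Appendix B: 128 msecs is taken as 10 segments and 410 msecs as 32 segments of 12.8 msecs each, and the two shift amounts are hypothetical stand-ins for the entries of Table 1, chosen only because halving by shifts is cheap on a 12-bit machine.

```python
UPDATE_SEGMENTS = 10      # 10 x 12.8 msecs = 128 msecs between updates
LOOKAHEAD_SEGMENTS = 32   # 32 x 12.8 msecs = 409.6 msecs, about 410 msecs

# Hypothetical scale factors; the actual values belong to Table 1.
SILENCE_SHIFT = 4         # silence threshold = I_max / 16
TOLERANCE_SHIFT = 2       # similarity tolerance = I_max / 4


def thresholds_per_segment(intensities):
    """Return one (silence, tolerance) pair per segment.

    The pair is recomputed every UPDATE_SEGMENTS segments from the maximum
    intensity found in the next LOOKAHEAD_SEGMENTS segments.
    """
    result = []
    for start in range(0, len(intensities), UPDATE_SEGMENTS):
        i_max = max(intensities[start:start + LOOKAHEAD_SEGMENTS])
        pair = (i_max >> SILENCE_SHIFT, i_max >> TOLERANCE_SHIFT)
        block_len = min(UPDATE_SEGMENTS, len(intensities) - start)
        result.extend([pair] * block_len)
    return result


# A loud stretch followed by a quiet one: the thresholds drop with it.
table = thresholds_per_segment([1000] * 10 + [100] * 40)
```

Because every threshold is a fixed fraction of the local maximum, scaling the whole utterance by a constant leaves the decisions essentially unchanged, which is the sense in which they are amplitude insensitive.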

A flow-chart for the program is shown in Fig. 6. The thres- 
holds are first set and a test is carried out to determine whether the 
current and the next segment can be grouped as silence. Two segments 
are grouped as silence or pause if the sound intensities of the nth and
(n + 1)th segments satisfy

$I_n < \mathrm{SILENCE}$ (Table 1) and $I_{n+1} < \mathrm{SILENCE}$

and if the numbers of zero crossings in the nth and (n + 1)th segments
satisfy

$Z_n < 16$ and $Z_{n+1} < 16$.

If the test for silence succeeds the group is tagged as "silence" and
the program proceeds to test the next segment. In this case the
subroutines for comparing the three features are bypassed. Care is taken
to start a new group of segments whenever there is a transition from a
segment that is "silence" to one that is "not silence" and vice versa.
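
The silence test and the rule for starting a new group can be written compactly. In the Python sketch below the function names are hypothetical and the silence threshold is passed in as a parameter, since its actual value comes from Table 1; the limit of 16 zero crossings is the one given in the text.

```python
ZERO_CROSSING_LIMIT = 16  # limit on zero crossings per segment for silence


def is_silence_pair(i_n, i_next, z_n, z_next, silence_threshold):
    """True if the nth and (n + 1)th segments can be grouped as silence."""
    return (i_n < silence_threshold and i_next < silence_threshold
            and z_n < ZERO_CROSSING_LIMIT and z_next < ZERO_CROSSING_LIMIT)


def silence_boundaries(silent):
    """Indices where a new group must start because the silence state changes.

    silent is a list of booleans, one per segment.
    """
    return [k for k in range(1, len(silent)) if silent[k] != silent[k - 1]]
```

Requiring few zero crossings as well as low intensity keeps weak fricatives, which are quiet but cross zero often, from being mistaken for pauses.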


If the test for silence fails the subroutines for comparing
"waveform asymmetry", "sound intensity" and "number of zero crossings"
are executed in that order. The three subroutines return different
results to the main control program. These results depend upon whether
the nth and (n + 1)th segments are similar or dissimilar with respect to
the particular feature being compared. The similarity measures for the
three features are listed below:

1. The (n + 1)th segment is said to be similar in "sound intensity" to
the nth segment if
[...]
or
[...]
where the tolerance
[...], [...] being the maximum intensity in the next 410 msecs.

2. The (n + 1)th segment is said to be similar in "waveform asymmetry"
to the nth segment if
[...]

In all the three similarity measures the (n + 2)th segment is
compared to the nth if similarity between the nth and (n + 1)th is not
found. This is to take care of the circumstance, though unlikely to
occur with the segment duration of 12.8 msecs, where the segment lies
entirely between two peaks of the speech wave.

When there are gradual variations in the characteristics in
one direction an effective reduction in tolerance is needed to avoid
grouping transitional segments together. This is obtained by using the
average of the characteristics of all the segments already in the group
in place of the characteristics of the nth segment in the tests for
similarity. The average is found in the following manner:

    AVG_1 = C_1

where C_p is the particular characteristic for the pth segment in the
group.
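
The way a group average might be carried from segment to segment can be
sketched as below. Only the starting value (the average equals the
characteristic of the first segment in the group) is stated in the text.
The update step shown here, the rounded mean of the previous average and
the new value, is an assumption suggested by the add-and-shift sequences
in the Appendix B listing. The function name is hypothetical.

```python
def update_group_average(avg, value, same_group):
    """Return the running average of one characteristic for the current group.

    avg        -- average carried over from the previous segment
    value      -- characteristic C_p of the segment just examined
    same_group -- False when the segment starts a new group
    """
    if not same_group:
        # First segment of a group: the average is the value itself.
        return value
    # Assumed update: mean of old average and new value, rounded half up
    # (valid for non-negative integers, as with 12-bit feature counts).
    return (avg + value + 1) // 2
```

An update of this kind weights recent segments more heavily than a true
arithmetic mean, so the average follows a slow drift while still
narrowing the effective tolerance.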

The main control program groups the nth and (n + 1)th segments
together if

    FLAGI + FLAGA + FLAGZ ≥ 3

where FLAGI, FLAGA and FLAGZ are the output parameters of the subroutines
for comparing "intensity", "asymmetry" and "zero crossings", respectively.
The values given to these parameters are:

    FLAGI = 2 if the segments are similar in "intensity",
and FLAGI = 0 otherwise;
    FLAGA = 1 if the segments are similar in "asymmetry",
and FLAGA = 0 otherwise;
    FLAGZ = 3 if the segments are grouped together as "fricative",
    FLAGZ = 1 if the segments are similar in "zero crossings" but
              are not "fricative",
and FLAGZ = 0 otherwise.


It may be noticed that the "sound intensity" characteristic
is generally given more weight. An exception to this rule, however, is
made when the segments are found to be "fricative". These are noise-like
segments so that both intensity and the number of zero crossings
may vary considerably from segment to segment and are not reliable in
finding similarity. Therefore when two segments are found to be
"fricative", as defined later, the segments are always grouped together
by putting FLAGZ = 3.
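
The grouping decision can be sketched as follows. This is an
illustration in Python with hypothetical names. It reads the grouping
condition as "sum of the flags at least 3", which is the reading under
which a "fricative" pair (FLAGZ = 3) is always grouped, as the text
requires.

```python
def should_group(similar_intensity, similar_asymmetry,
                 similar_zero_crossings, fricative):
    """Decide whether the nth and (n + 1)th segments join the same group."""
    flag_i = 2 if similar_intensity else 0      # intensity carries most weight
    flag_a = 1 if similar_asymmetry else 0
    if fricative:
        flag_z = 3                              # forces grouping on its own
    elif similar_zero_crossings:
        flag_z = 1
    else:
        flag_z = 0
    return flag_i + flag_a + flag_z >= 3
```

With these weights, similarity in intensity plus one other feature is
enough to group two segments, whereas similarity in asymmetry and zero
crossings alone is not.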

As similar segments are grouped together they are also tagged
as "vowel-like", "silence", "fricative" or "plosive". A group is tagged

"vowel-like" if the "sound intensity" of a segment in the group

    I > voiced-unvoiced threshold (Table 1)

or if its "waveform asymmetry" reaches the threshold for asymmetry;

"silence" if it fulfils the conditions in the test for silence described
earlier;

"fricative" if the sound intensity of the nth segment

"plosive" if it is not any of the cases listed above. It should be
noted that the classes of sounds are determined by measurements of certain
characteristics of the speech waveform and not by phonetic rules.
Although a realistic classification has been attempted the terms
"vowel-like", "fricative" and "plosive" may not strictly correspond to
similar ones in phonetics.

FIG. 8 ORIGINAL WORD, EXTRACTED FEATURES, AND

After the initial segmentation of speech into groups of seg- 
ments with similar characteristics the various groups are searched for 
redundant segments. The final decisions, as to which segments are to 
be discarded, are based on two factors: the number of 12.8 msec seg- 
ments in the group, and the type of sound the group represents. 

Research on speech perception indicates that only a few complete
pitch periods are sufficient for the recognition of a vowel sound.
The perception of consonants, fricatives and plosives, however, depends
on certain acoustic cues (24) such as burst frequency, presence or
absence of friction and voicing. The effect of deleting initial parts
of consonant-vowel syllables on their perception has been discussed by
Grimm (25). He found that the intelligibility of these syllables falls
rapidly when the syllable is truncated at the initial end to commence
50 msec or less before the peak intensity of the vowel of the syllable.
The perception decreases more rapidly in the case of plosive consonants
than fricative consonants. A tentative scheme, based on the above
considerations, has been developed for discarding segments of different
types of sounds. The rules used are as follows:

1. "Vowel-like" sounds:

   No. of segments in the group    Rule
   less than 3                     Do not discard any segment
   3                               Retain one segment at each end
   4 or 5                          Retain one segment at one end
                                   and two at the other
   6, 7 or 8                       Retain two segments at each end
   9 or 10                         Retain one, discard two,
                                   retain three, discard the rest
                                   retaining two at the end
   11 or greater                   Retain two, discard two,
                                   retain three, discard the rest
                                   retaining two at the end.

2. "Fricatives" or "silence":

   No. of segments in the group    Rule
   less than 5                     Do not discard any segment
   5 to 8 inclusive                Retain two segments at each end
   9, 10 or 11                     Retain three segments at each end
   12 or greater                   Retain four segments at each end.

3. "Plosives":

   Do not discard any segment.
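
The discarding rules above can be sketched as a function that returns,
for a group of n segments, which segments are retained. This is an
illustration in Python, not the author's decision making program. The
names are hypothetical, the row ranges "4 or 5", "6 to 8", "9 or 10"
and "9 to 11" are read from partly illegible rows, and the choice of
which end of a 4 or 5 segment vowel group keeps two segments is an
assumption.

```python
def keep_mask(kind, n):
    """Return a list of n booleans: True = retain the segment, False = discard."""
    if n <= 0:
        return []
    gap = mid = 0                       # only long vowel groups use these
    if kind == "plosive":
        head, tail = n, 0               # plosives are never shortened
    elif kind == "vowel-like":
        if n < 3:
            head, tail = n, 0
        elif n == 3:
            head, tail = 1, 1
        elif n <= 5:
            head, tail = 1, 2           # assumed: two kept at the trailing end
        elif n <= 8:
            head, tail = 2, 2
        elif n <= 10:
            head, gap, mid, tail = 1, 2, 3, 2
        else:
            head, gap, mid, tail = 2, 2, 3, 2
    elif kind in ("fricative", "silence"):
        if n < 5:
            head, tail = n, 0
        elif n <= 8:
            head, tail = 2, 2
        elif n <= 11:
            head, tail = 3, 3
        else:
            head, tail = 4, 4
    else:
        raise ValueError("unknown sound type: %r" % (kind,))

    mask = [False] * n
    for i in range(head):                           # leading retained run
        mask[i] = True
    for i in range(head + gap, head + gap + mid):   # middle retained run
        mask[i] = True
    for i in range(n - tail, n):                    # trailing retained run
        mask[i] = True
    return mask
```

For example, `keep_mask("vowel-like", 9)` retains segments 1, 4, 5, 6, 8
and 9 of the group, and a "silence" group of twelve or more segments is
always reduced to eight.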
CHAPTER IV


TRANSIENT REMOVAL AND OUTPUT 


When the redundant segments to be deleted are known the 
problem is how to join the remaining segments together so that there 
are no undesirable transients. 

Three types of transients may occur. First, a local transient,
that is, a discontinuity at the junction of the two retained
segments. This discontinuity may be in the form of a sudden change
in amplitude or an undesirable change in slope or a combination of
the two. Secondly, a change in the pitch of voiced sounds. This can
occur in the form of two glottal pulses lying too close or too far
apart compared with the regular pitch period. Thirdly, a change in the
intensity of the sound. Although a large difference between the
intensities of the two segments being joined is unlikely, some amplitude
smoothing in the vicinity of the junction is necessary.

The scheme for transient removal used here attempts to
minimize all the three types of irregularities listed above. As
mentioned earlier the location of the positive maximum in each segment is
found and stored during input. Joining the segments at these points
of zero slope will prevent any unwanted change in the slope of the
waveform at the junction. The positive peaks of the speech waveform are
also, in most cases, a fairly good approximation to the locations of
the glottal pulses in voiced sounds. Therefore joining the segments
at their positive peaks will also minimize any change in pitch that
may otherwise occur. This rule when combined with a rule for
smoothing the amplitude at and in the vicinity of the junction will yield
the desired result.

Thus if segments n to m (n ≤ m) inclusive are to be deleted,
segment n - 1 is joined with segment m + 1 in the following manner:

1. Let the positive maximum sample in segment n - 1 be the pth sample
   and in segment m + 1 be the qth sample. Then the pth sample is
   followed by the (q + 1)th sample, i.e. samples p + 1 to q are
   discarded.

2. Let the intensity of segment n - 1, as defined in chapter I, be
   I_{n-1} and of segment m + 1 be I_{m+1}.

   a) If I_{n-1} > I_{m+1} then all the samples from p - 63 to p are
      multiplied by G where G decreases linearly in value from 1 at
      sample p - 63 to approximately

      at sample p. The approximation is due to the fact that the slope

   b) If I_{m+1} > I_{n-1} then all the samples from q + 1 to q + 64 are
      multiplied by G where G increases linearly from

      at the (q + 1)th sample to approximately 1 at sample q + 64.
      The slope of G is again approximated as in case (a) above.

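
The joining procedure can be sketched as follows. This is an
illustration in Python with hypothetical names, not the output program
of Appendix B. The gain value at the junction end of the ramp is not
legible in the text, so it is passed in as the parameter `target`. The
sketch also uses an exact linear ramp, whereas the original approximates
the slope.

```python
def ramp(start, end, n=64):
    """Return n gain values changing linearly from start to end."""
    step = (end - start) / (n - 1)
    return [start + i * step for i in range(n)]


def smooth_junction(x, p, q, i_before, i_after, target):
    """Join the signal at two positive peaks and smooth the amplitude.

    x        -- list of samples
    p, q     -- indices of the positive peaks in segments n - 1 and m + 1
                (requires p >= 63 and q + 64 < len(x))
    i_before -- intensity of segment n - 1
    i_after  -- intensity of segment m + 1
    target   -- gain reached at the junction (assumed to be supplied by
                the caller; its formula is not recoverable from the text)
    """
    y = list(x)
    if i_before > i_after:
        # Case (a): fade samples p - 63 .. p from gain 1 down to target.
        for k, g in enumerate(ramp(1.0, target)):
            y[p - 63 + k] *= g
    elif i_after > i_before:
        # Case (b): raise samples q + 1 .. q + 64 from target up to gain 1.
        for k, g in enumerate(ramp(target, 1.0)):
            y[q + 1 + k] *= g
    # Sample p is followed directly by sample q + 1.
    return y[:p + 1] + y[q + 1:]
```

Only the louder side of the junction is scaled, so the quieter segment
passes through unchanged and the splice removes exactly q - p samples.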
The flow-chart for the transient removal and output program
is shown in Fig. 9. The data required for removing transients is put
in order before output begins so that it is readily accessible during
output. The locations of the positive peaks in segments of the type
n - 1 and m + 1 are selected and stored separately. The starting value
for G and its slope are then computed for all junctions of retained
segments. These values are stored sequentially together with the
information whether I_{n-1} is greater or less than I_{m+1} for each case.

The diagram of core usage during output is shown in Fig. 10.


Three processes proceed simultaneously during output. They 

are: 1) reading the stored speech samples from disk into core, 
2) removal of transients and 3) D/A conversion of the samples of com- 
pressed speech at 5 KHz. Large blocks of data are read from the disk 
and stored temporarily in the computer core. As the digitized samples 
are being converted into analog voltages the desired number of samples 
are skipped and amplitude smoothing is done at the junctions. 

FIG. 9 FLOW-CHART OF TRANSIENT REMOVAL

Since a number of samples are skipped during output the rate
of reading speech samples from the disk must be considerably higher than
the rate of D/A conversion. A low rate of 5 KHz of D/A conversion has
been necessitated by the limit of data transfer rate from disk to core.
This is a hardware limitation. Advantage is, however, taken of this
fact and the available time is utilized in removing transients. The
amount of compression obtainable is restricted because at large
compressions the D/A conversion tends to go faster than the data
transfer rate from disk.
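
The restriction can be put in numbers with a small sketch. If a
fraction f of the stored samples is retained, then 1/f samples must be
read from disk for every sample converted, so the disk rate must be at
least the D/A rate divided by f. The function below is an illustration
in Python; its name is hypothetical, and it ignores the time spent on
transient removal.

```python
def min_retained_fraction(dac_rate_hz, disk_rate_hz):
    """Smallest fraction of samples that can be retained without the
    D/A converter outrunning the disk.

    dac_rate_hz  -- D/A conversion rate in samples per second
    disk_rate_hz -- sustained disk-to-core rate in samples per second
    """
    if dac_rate_hz <= 0 or disk_rate_hz <= 0:
        raise ValueError("rates must be positive")
    # Reading 1/f samples per output sample needs dac_rate / f <= disk_rate.
    return min(1.0, dac_rate_hz / disk_rate_hz)
```

With a purely hypothetical disk rate of 8000 samples per second and D/A
conversion at 5 KHz, the speech could not be shortened below 62.5% of
its original duration.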

An alternative scheme would be to write the speech samples 
back on disk after compression and transient removal. The samples 
could then be read again into core and D/A converted at 10 KHz to give 
compressed speech. This scheme would require more computer time but 
there is no restriction on the amount of compression. 

A zero order hold circuit is present at the output of the
D/A converter. This circuit holds the level of the converted analog
voltage constant until the D/A conversion of the next speech sample.
The stair-case waveform thus obtained is smoothed by passing it through
a band-pass filter with a pass band from 40 Hz to 2.5 KHz. This speech
is recorded on an ordinary tape recorder and played back at twice the
speed to restore the frequency spectrum.

All the source programs are listed in Appendix B including 


a program for the alternative scheme for output mentioned above. 
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CHAPTER V 


CONCLUSIONS 


A selective method of speech compression has been developed
by taking full advantage of the capabilities of a small inexpensive
computer. It has been demonstrated that a computer with a small
capacity main storage and a backing store can be used as a practical
speech compressor.

Selective or differential compression has distinct advantages 
over a method in which portions of speech are discarded periodically. 
In the selective method reported here there is very little probability 
that portions of speech essential to its intelligibility will be re- 
moved. Compression is achieved by shortening the long inter-phrase 
pauses and discarding fractions of those speech sounds which have sig- 
nal redundancy. Different types of speech sounds are thus treated 
separately. 

On the present system with 4K words of core capacity and a 
disk memory of 32K words only short sentences have been compressed. 

The method, however, is attractive for compression of continuous spoken 
passages if a large capacity high speed disk or tape unit is connected 
to the computer. 

The compression scheme used attempts to minimize processing
time on the computer. The time taken by the decision making program
is only about one fourth the original duration of the speech signal to
be compressed. This is in contrast to other methods of compressing
speech by a digital computer, such as compression by synthesis (10)
or pitch period compression (15).

Apart from being a practical possibility, the method of 
speech compression reported here is useful as a research tool. It can 
be used for evaluating the effect of several factors on the intelligibil- 
ity of compressed speech by varying these factors independently. It can 
also be used for arriving at a selective rule for compression which 
produces optimum intelligibility for a particular word per minute rate 
of speech. 

A tentative rule for compression, described in chapter III,
was chosen and the intelligibility of the compressed material was tested.
Short sentences were compressed to an average of 70% of their original
duration. The sentences used were from the Harvard Psycho-Acoustic
Laboratory (P.A.L.) Auditory Test No. 12 (26). Forty-two sentences
were used for the test, the first two of which are:

1. How many pennies are there in a nickel? 

2. Is there a lot of water in the desert?

These questions were read by a Canadian male speaker and were
recorded on tape. Subjects chosen for the listening test were of
Canadian origin and had never been exposed to compressed speech. They
were five undergraduate university students, four male and one female.
The compressed speech was presented to the subjects by means of a
loudspeaker in an ordinary room. Three additional compressed sentences
were first presented with the original sentences to familiarize the
subjects with compressed speech. They were then asked to write the
answers to the forty-two questions which followed. Four of the subjects
answered only one question wrong whereas the fifth answered three
incorrectly. Although a more comprehensive study of the intelligibility
of speech compressed by this method is needed the results of this
preliminary test are encouraging.

Fairbanks' sampling method of speech compression was simulat- 
ed on the digital computer and compared with the method reported here. 
Informal listening tests showed that selectively compressed speech was 
clearer and free of the low frequency rumble present in Fairbanks' 
method. 

The amount of compression, though dependent on the original
speech signal and its redundancy, can be changed by varying the
parameters of the program. The compression can be increased by increasing
the tolerance intervals for decision making and by using a rule which
discards a larger fraction of a group of similar segments. The total
compression for connected discourse will generally be greater than
that for individual sentences because of the presence of inter-sentence
pauses.
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APPENDIX A 


* 
ENGLISH PHONEMES 








Phonetic Key Phonetic Key 
Symbol Word Symbol Word 
Simple vowels Plosives 
dl Dae b bad 
al feet d dive 
€ tet g give 
ae bat >) pot 
A but t toy 
fo) not k cat 
re) law 
book Nasal consonants 
u boot m may 
3 priacd n now 
3 Bert i) sing 
Complex vowels Fricatives 
e pain Z zero 
oO go S vision 
aU house V very 
al ce DS that 
ae boy h hat 
IU few f fat 
Semivowels and liquids 8 thing 
j you ip shed 
W we S sat 
iL late Affricatives 
fe rate se church 
dz judge 
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N. Lindgren, "Machine recognition of human language Part I - 
APPENDIX B


/INPUT AND FEATURE EXTRACTION PROGRAM 


WC=7750
CA=7751 


/LIST OF VARIABLES

*G 

TALLY,  0
ZCRS,   0
NEW,    0       /NEW SPEECH SAMPLE
MAXN,   0       /NEGATIVE MAXIMUM
MAXP,   0       /POSITIVE MAXIMUM
FLAG,   0       /FLAG FOR SIGN OF PREVIOUS SAMPLE
TALLY1, 0
TALLY2, 0


*29 

JINITIALIZING INSTRUCTIONS 
HLT CLA 
TAD CONS 
DCA MAXLOC 
TAD CNST+1 
DCA TALLY 
TAD CONS+2 
DCA 18 
DCA 1 18 
SZ TALLY 
JMP «-2 
TAD CONS+6 
DCA 10 
TAD CNST+2 
DCA TALLY 1 
TAD CNST 
DCA TALLY2 
TAD CONS*1 
DCA TALLY3 
TAD CONS+3 
DCA ZCRS 
TAD CONS+4 
DCA 15 
TAD CONS+5 
DCA 11 
DCA NEW 
TAD CEADDR~1 
DCA 16 
TAD CADDR~1 
DCA 17 
DCA FLAG 
DCA MAXP 
DCA MAXN 
TAD CONS+*1 


DCA TALLY 

6121 /SKIP ON PULSE INDICATING 
JM em 1 7START OF SPEECH SIGNAL 
6412 

6414 

6421 

6422 

JMP ADCON 


/SUBROUTINE TO WRITE FIRST BLOCK OF 
/DATA ON DISK 
seer ce (4) 
TAD WCOUNT 
DCA WC 
TAD CONS+6 
DCA CA 
DEAL 
TAD WCOUNT 
DMAW 
aula We (Ody pom E 


ZINSTRUCTIONS FOR READING DECISION MAKING 
7ZAND OUTPUT PROGRAMS FROM DISK 
FETCHs CLA 

TAD CONS+7 

DCA WC 

TAD TABLE 

DCA CA 

DEAL 

TAD CONS+3 

DMAR 

DFSC 

JMP e-1 

DFSE 

Hi 

JMP I CONS+1@ 


/LIST OF CONSTANTS 
CONSs 720@ 
-20 
377 
480 
ar 
1377 
iat. 
- 2480 
22600 
TABLE» 2177 
WCOUNTs 4800 
CNST»s -208 . 
~ 369 
-~2690 
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/SOME MORE VARIABLES 
MAXLOCs 


TALLY 33 


1A4@ 
20@8 
2A 
380 
360 
460 
400 
SAG 
SAG 
620 
600 
700 
70B 


Q 
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MULs @ 
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DFSC 

UMP ee-p) 

DF SE 

HT 

SMP Pl UNI SH 


DCA «+2 
MUY 
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SHL 

4) 

UMP Sig MULE 


SZ BALLY 2 
JMP IN 
TAD CNST 
DCA TALLY2 
Ss@ ZGRS 
1SZ MAXLOC 
TAD MAXN 
CIA 
TAD MAXP 
SMA SZA 
JP se tS 
DCA 1 15 
TAD MAXN 
DCA 1 11 
JMP +4 
DEAMIVEYS 
TAD MAXP 
DOA Gl gait 
DCA MAXP 
DCA MAXN 
USZ eLALLY 
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LOC; 


TAD 


JMP IN 
TADS GI777 
DCA 16 
TAD (€-26 
DCA TALLY 
NEW 
SPA 
JMP 
TAD 
SZA 
JMP 
IAC 
DCA 
JMP 
TAD 
SNA 
Mi 
DCA 
Sys 
Loe 
JMP 
SZ 
Se 
JMP 
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ot+7 
FLAG 
CLA 
o+11 


FLAG 
ot5 
FLAG 
CLA 
et+3 
FLAG 
ACRES 
TALLY 1 
e*+13 
ee ie ys 
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4 Seer 1 


TAD 
DCA 
TAD 
DCA 
TAD 
DCA 
6111 
JMP 
6434 
DCA NEW 
TAD NEW 
DCA 1 190 
6412 
6414 
6421 
6422 
TAD 
SPA 
JMP 
CIA 
TAD 
SMA 
JMP 


GET 
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LOG+1 
WCOUNT 
Alaeyel 


sat 


NEW 
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MAXP 

CLA 
ADCON 
TAD NEW 

DCA MAXP 

TAD 14 

DCA 1 MAXLOC 
JMP ADCON 
TAD MAXN 

SMA CLA 


CJMP LOC+5 
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ZNO 
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NEGATIVE? 


7NO 
YES 


SIGN OF PREVIOUS SAMPLE? 


/ANOTHER ZERO CROSSING 


peitie 168 
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7READ A-~D CONVERTER 


7RESET A-D CONVERTER 


FRESET A= 
Cio a= 


D JNPUT MAX 
D CONVERT D 
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TO CHANNEL 
ONE FLAG 


ZILOGATA ONBUP BP ODI TRV ES MAX. 
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JMP «+5 

TAD NEW 

CIA 

DCA MAXN /NEGATIVE MAXIMUM 
JMP ADCON 
TAD MAXP 

SZA CLA 

JMP ADCON 
TAD 10 

DCA I MAXLOC 
JMP ADCON 
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WRITEs @ 

DF SC 

HET 

DP SE 

HLT 

TAD WCOUNT 

DCA WC 

TAD ©1777 

DCA CA 

TAD 1 16 

DEAL 

CLA 

TANey EP 

DMAW 

JMP J, WRITE 
Gils eal oa tere 


/TRACK ADDRESS ON DISK 

ADDRs @ 
490@ 
4) 
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ABGO 
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4) 
4B00 
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VALUE Y 
*20 

Nis @ 
NiPl» 
NiP2s 
N2s @ 
NOP 15 
N2@P2s 
N3s @ 
N3P 13 
N3P2>s 
N4s @ 
N4P 1s 
Cs @ 
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OF VARIABLES 
/PAGE @ 
/SUBSCRIPT FOR SOUND INTENSITY 


“SUBSCRIPT FOR NOs OF 
g 7ZERO CROSSINGS 


/SUBSCRIPT FOR WAVEFORM ASYMMETRY 


4) 


SILENCs @ /THRESHOLD FOR SILENCE 


TRSDs 


LIMs @ 


Tis @ 

Tes 6B 

FLAGAs 
FLAGI » 
FLAGZs 
FLAGSs 
FLAGV>s 
ADDR» 

TALLY» 
TALLY 1 


4) “THRESHOLD FOR ASYMMETRY 


/TOLERANCE FOR INTENSITY 
/TOLERANCE FOR ZERO CROSSINGS 
/GUTPUT PARAMETERS OF SUBROUTINES 


/FLAG FOR SILENCE 
7FLAG FOR VOICING 


Qe aosgd 


4) 
Q 
» @ 


TALLY2s @ 


STOREs 
AVGAs 
AVGI » 
AVGZ>» 
COUNTs 
TEMP s 


MAXs 6 


1Vs @ 


Pr sae 
IMTs @ 
XPTs @ 
XMT»s @ 


TEMP 1 >» 
TEMP2» 


LOCs @ 
Gets. 3 


BEGSs 
PICKs 
ONCE>s 
CODE» 
NCODE»s 
EPICKs 
ENDS» 
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Q 
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*EOAD 
/INITIALIZING INSTRUCTIONS 
PNT sy CLD CMA /GIVE INITIAL VALUES TO 
DCA TALLY1 /VARIABLES 
DCA FLAGS 
De OCS fo. 
DCA N38 
TAD (1962 
DCA N&P1 
TAD €1001 
DCA N3P2 
DCA FLAGA 
TADS Cis i 
DCA N1 
TAD €146@ 
DCA NiP1 
TAD €14@1 
DCA N1P2 
DCA FLAGI 
ee C317. 
DCA N2 
TAD C406 
DCA N2@P 1 
TAD €461 
DCA N2P2 
DCA FLAGZ 
CLA CMA 
DCA ONCE 
JMP GO 


/MAIN CONTROL PROGRAM 
MAINs ISZ C 
SV WAN ba Ney /F INI SHED? 
SKP 7NO 
JMP 3268 AAS 
Dee Coles tas 
TAD STORE 
SMA SZA CLA 
JMP «+6 
i Wes vaie 1 WS) BA ee | 
Side 
JIMS FIX 7FIX THRESHOLDS EVERY 
TAD ¢-12 7TEN SEGMENTS 
DCA TALLY 1 
1SzZ N1 /JINCREMENT SUBSCRIPTS 
sz NiPd 
ova 
132. Nero) 
TAD I NI /CHECK FOR SILENCE 
TAD SILENC 
SMZ SZA CLA 
Mie OU, /NOT SILENCE 
TAD 1 NIP 1 
TAD SILENC 
SMA SZA CLA 
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IMP 
TAD 
TAD 
SMA 
JMP 
TAD 
TAD 
SMA 
JMP 
TAD 
SZA 
JMP 
TSZ 
SKP 
JMP 
TAD 
DCA 
CMA 
TAD 
CIA 
TAD 
SNA 
JMP 
CMA 
TAD 
DCA 
TAD 
DCA 
LAC 
DCA 
JIMP 


OUT 
1 N2 
C326 


SZA CLA 


QUT 
I N2P1 
(e290 


SZA CLA 


OUT 
FLAGS 
CLA 


OUT» TAD FLAGS 


SNA 
JMP 
DCA 
TAD 
DCA 
LAC 
DCA 
JMP 

MS 
JIMS 
JMS 


CLA 
ot+7 
FLAGS 
C 

I 19 


Til 1 
CONT 
ASYM 
INTNS 
ZEROX 


DCIDEs TAD FLAGA 


TAD 
TAD 
TAD 
SPA 
JMP 
TAD 
DCA 
JMP 


*2400 
SUBROUTINE 


FLAGI 
FLAGZ 
(si3 
CLA 
BRK 
NCODE 
CODE 
MAIN 


“PREVIOUS SEGMENT SILENCE? 


ea GNSS 


“NOs REGISTER BREAK 


7NOTS SILENCE 

7/PREVIOUS SEGMENT SILENCE? 
ZNO 

TLS 

/REGISTER BREAK 


/TAG GROUP AS SILENCE 


7ZEXECUTE SUBROUTINES 


/DECIDE 


JAREO SEGMENTS SIMILAR? 
/NOs REGISTER BREAK 
ZI ES 
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ASYMs @ 

15Z 
ISZ 
SZ 
TAD 
SZA 
JMP 
TAD 
JMP 
TAD 
TAD 
ASR 
Q 

DCA 
TAD 
DCA 
TAD 
SPA 
JMP 
CIA 
TAD 
SMA 
JMP 
JMP 
TAD 
SMA 
JMP 
IAC 
DCA 


N3 
NSP 1 
N3P2 
FLAGA 
CLA 
et+-3 

1 N3 
et5 
AVGA 
I NZ 


AVGA 
Ge2 
PABLY2 
AVGA 


eotG 


TRSD 
SZA CLA 
TWO 

ONE 
TRSD 
SZA CLA 
TWO 


FLAGV 


THREEs TAD 1 NSP 1 


SPA 
JMP 
CIA 
TAD 
SMA 
JMP 
JMP 
TAD 
SPA 
JMP 
TAD 
SMA 
JMP 
TAD 
SMA 
JMP 
JMP 


ot6 


TRSD 
SZA CLA 
o+5 
BRKA 
TRSD 
CLA 
CONTA 

I N3P2 
SZA 
BRKA 
TRSD 
SZA CLA 
BRKA 
CONTA 


TWOs DCA FLAGV 


TAD 
SPA 
JMP 
CIA 
TAD 
SMA 
JMP 


I N3P 1 
ot6 
TRSD 


SZA CLA 
CONTA 


= a5 lt 


ZINCREMENT SUBSCRIPTS 


/FIND AVERAGE IF SAME GROUP 


Fis ASYMMETRY POSITIVE? 
7NO 
ify) BS) 


/GREATER THAN OR EQUAL TO TRSD? 


/NO 
JYES 


/SETUPIEAG SPOR SVOTCING 
/ WAVEFORM HAS NEGATIVE ASYMMETRY 
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JMP ot4 
TAD TRSD 
SMA SZA CLA 
JMP CONTA 
TAD I N3P2 
TsZ, TAuLY2 
JMP TWO+2 
JMP BRKA 
ONEs CLA IAC 7'WAVEFORM HAS POSITIVE ASYMMETRY 
DCA FLAGV /SET FLAG FOR VOICING 
TAD I N3P1 
SMA SZA 
JMP #5 
TAD TRSD 
SMA SZA CLA 
JMP o+6 
JMP BRKA 
CIA 
TAD TRSD 
SPA CLA 
JMP CONTA 
TAD I N3P2 
SPA 
JMP BRKA 
CIA 
TAD TRSD 
SMA SZA CLA 
JMP BRKA 
CONTAs CLA IAC 7SIMILAR IN ASYMMETRY 
DCA FLAGA 
JMP I ASYM 
BRKA»s CLA ZUNSIMILAR IN ASYMMETRY 
DCA FLAGA 
JMP I ASYM 


7PARTS OF MAIN CONTROL PROGRAM 

CONTs 15Z N3 /SEGMENT WAS SILENCE 
SZ sNGPH /JINCREMENT SUBSCRIPTS 
Lot Wark 2 
MSZ Nike 
RSZaNerPe 
TAD NCODE 
DCA CODE 
JMP MAIN 

BRKs TAD C /SEGMENTS UNSIMILAR 
DCA 1 10 /RECORD BREAK 
TAD CODE /RECORD TYPE OF SOUND 
DCA I 11 
TAD NCODE 
DCA CODE 
JMP MAIN 


ZINITIALIZING INSTRUCTIONS 


GO,» TAD N1 
DCA STORE 
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DCA 
JiMpP 


CODE 
MAIN 


/UTILITY SUBROUTINE 


CHANGE»s @ 
TAD 
DCA 
13Z 
JMP 
JMP 


k2608 

/SUBROUTINE 

FIXs @ 
TAD 
DCA 
DCA 
TAD 
DCA 
CLA 
ISZ 
SKP 
JMP 
TAD 
CIA 
TAD 
SPA 
JMP 
TAD 
DCA 
JMP 
TAD 
CLL 
Sau 
IAC 
CLL 
SZ 
1AC 
DCA 
TAD 


Cul 


sya 
LAC 
DCA 
TAD 
CLL 
oZL 
LAC 


“SHIFT DATA IN CORE 
I 10 
1Sri 
TALLY 
e233 
I CHANGE 


FOR SETTING THRESHOLDS 


(-40 
COUNT 


COUNT 
e+ il 


MAX 7FIND MAXIMUM INTENSITY IN 
/NEXT 32 SEGMENTS 


lV /VOICED-UNVOICED THRESHOLD 


TRSD 7THRESHOLD FOR ASYMMETRY 
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DCA 
TAD 
CLL 
yal le 
IAC 
DCA 
TAD 
CEL 
SZL 
1AC 
TAD 
CIA 
DCA 
TAD 
TAD 
DCA 
JMP 


/ SUBROUTINE 
INTNSs @ 
PSZ 
TAD 
SZA 
JMP 
TAD 
JMP 
TAD 
TAD 
CEE 
SyAk 
1AC 
DCA 
TAD 
CEL 
SAE 
1AC 
CLL 
SZ 
LAC 
CLL 
SZL 
LAC 
DCA 
TAD 
CIA 
TAD 
SMA 
JMP 
TAD 
DCA 
TAD 
TAD 
DCA 
TAD 
CIA 


SILENC 
SILENC 
RAR 


LIM “MININMUM TOLERANCE FOR 
LIM /COMPARING INTENSITY 
RAR 


SILENC 


SILENC /7THRESHOLD FOR SILENCE 
STORE 

Cre 

STORE 

ere xX 
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NiPe2 
FLAGI /FIND AVERAGE IF SAME GROUP 
CLA 
ot3 
I N1 
ot 
AVGI 
LN 
RAR 


AVGI 
AVGI 
RAR 


RAR 
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Tl /TOLERANCE FOR INTENSITY 


Beit /AINTENSITY PLUS TOLERANCE 
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TAD 
DCA 
TAD 
CIA 
TAD 
SPA 
JMP 
TAD 
CIA 
TAD 
SMA 
JMP 
TAD 
C1A 
TAD 
SPA 
JMP 
TAD 
CIA 
TAD 
SPA 
JMP 
CLA 
TAD 
DCA 
JMP 


CONTI s 


BRKI » 
JMP 


AVGI 
IMT 
iN iP 


Pea 
CLA 
ot6G 
IMT 


Pe iP 
CLA 
CONTI 
I NiPe 


igege 
CLA 
BRKI 
IMT 


Ne 
CLA 
BRKI 


C2 
FLAGI 
EN TINS 


DCA FLAG] 


I INTNS 


JUTILITY SUBROUTINE 


LETs @ 
TAD 
DCA 
TAD 
DCA 
JMP 
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/SUBROUTINE 
/CROSSINGS 
ZEROXs @ 
oz 
TAD 
SZA 
JIMP 
TAD 
arr 
TAD 
TAD 
CEL 
SLs 
1AC 
DCA 
TAD 
CIEE. 


€5679 
BEGS 
(-44 
TALLY 
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/INTENSITY MINUS TOLERANCE 
JINTENSITY OF NEXT SEGMENT 


/LESS THAN OR EQUAL 
/NO 
Cecies 


TOeir tT? 


7GREATER THAN OR EQUAL TO IMT? 
LES 
JNOs 


CHECK SEGMENT NEXT+1 


ASIMLEAR IN INTENSITY 


7ZUNSIMILAR IN INTENSLTY 


FOR COMPARING NUMBER OF ZERO 


N2P2 
FLAGZ 
CLA 
et3 
aN 
ot+6& 
AVGZ 
Le 
RAR 


AVGZ 
AVGZ 
RAR 


7FIND AVERAGE IF SAME GROUP 
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SZL ass 
LAC 

CLL RAR 

SVAE 

IAC 

SNA 

IAC 

DCA T2 7TOLERANCE FOR ZERO CROSSINGS 
TAD AVGZ 

TAD Te2 

DCA XPT ZERO CROSSINGS PLUS TOLERANCE 
TAD T2 

Cla 

TAD AVGZ 

DCA XMT 7ZERO CROSSINGS MINUS TOLERANCE 
TAD AVGI /CABEL THE SEGMENT AS “VOWEL<LIKE* 
CIA ZaoP RA GCATUVE’s OR OV PLOSIVEA" 
(Ey) 

SPA CLA 

JMP et] 

TAD FLAGV 

SZA CLA 

JMP e+16 

TAD AVGZ 

TAD ¢€=-55 

SPA CLA 

JMP +11 

TAD I Ne@P} 

TAD (€-36 

SPA CLA 

JMP e%#5 

TAD (2 /“FRICATIVE’ 

DCA NCODE 

TAD ¢€3 

JMP CONTZ+1 

TAD ¢€3 TPEOSINV ES 

DCA NCODE /°VOWEL“-LIKE" 

TAD I N2@P1 /# ZERO CROSSINGS IN NEXT SEGMENT 
CIA 

TAD XPT 

SPA CLA /LESS THAN OR EQUAL TO XPT? 
JMP o+6 ZNO 

TAD XMT AGO 

CIA 

TAD I NeP1 

SMA CLA /GREATER THAN OR EQUAL TO XMT? 
JMP CONTZ /YES 

TAD 1 NePe2 JNOs CHECK SEGMENT NEXT+#1 

CIA 

TAD XPT 

SPA CLA 

JMP BRKZ 

TAD XMT 

CIA 

TAD 1 N2P1 
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SPA CLA 
JMUPABRKZ 

CONTZs CLA IAC 7SIMILAR IN NUMBER OF 
DCA FLAGZ 7ZERQ CROSSINGS 
IVROL¢ZEROX 

BRKZs DCA FLAGZ 7UNSIMILAR IN NUMBER OF 
JMROLEZCEROX 2220 CROSSINGS 


/PART OF FINAL DECISION MAKING PROGRAM 
Sy bry (63 Bye 7 SILENCE 
TAD 1 TEMP1 
CIA 
TAD I TEMP2 
TAD ¢(-5 
SPA 
JMP IN 
TAD (-4 
SPA 
JMP DO 
TAD (€=-3 
SPAS CLA 
JMP +14 
TAD CA 
TAD 1 TEMP1 
DCAP1la46 
TAD ¢€-4 
TAD 1 TEMP2 
DEAR I «37 
JMP IN 
TADACS 
TAD I TEMP! 
DCA 1 16 
TADEC=3 
JMP DO+5 
DOs CLA 
TADE C2 
TAD 1 TEMP! 
DEAR I WtsG 
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